A tibble is a table: a two-dimensional data structure with rows (observations) and columns (variables).

I’ll use the terms observations and rows interchangeably, depending on the context. The same goes for the terms variables and columns. As you may recall, the datasets pulse and survey were of type tibble. Each variable in a tibble has a fixed type, such as character or double. Let’s start by creating a tibble manually.

Create a tibble

To create a tibble you need to make sure that the package tidyverse is installed and loaded. See installation for more details.

Enter the following to load the tidyverse package:

library(tidyverse) 

A tibble is created with the function tibble, which takes a sequence of name=value pairs where each name is the name of a variable and each value is a vector holding that variable's values.

Take for example the variables name, year and colour to represent a person’s name, birth year and favourite colour:

favourite_colour  <- tibble(name=c("Lucas","Lotte","Noa","Wim"), 
                           year=c(1995,1995,1995,1994), 
                           colour=c("Blue","Green","Yellow","Purple"))

When creating a tibble, the column vectors must all be of the same length. The only exception is a vector of length one, which is repeated to fill the whole column.
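
The length rule can be tried out with a small sketch. The tibbles ok and bad below are made up for illustration, and the error is caught with tryCatch only so that the script keeps running; the exact error message depends on your tibble version.

```r
library(tidyverse)

# A vector of length one is repeated to match the other columns
ok <- tibble(name = c("Lucas", "Lotte", "Noa"), year = 1995)
nrow(ok)

# Any other difference in length is an error; tryCatch() replaces
# the error with a short message of our own
bad <- tryCatch(
  tibble(name = c("Lucas", "Lotte", "Noa"), year = c(1995, 1994)),
  error = function(e) "lengths do not match"
)
bad
```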

The variable favourite_colour now holds the data. Enter its name in the R Console for inspection:

favourite_colour
# A tibble: 4 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
3 Noa    1995 Yellow
4 Wim    1994 Purple

What additional pieces of information do you see beside the content we provided?

  1. ‘# A tibble: 4 x 3’, which says that this is a tibble with dimensions 4x3 (4 observations and 3 variables),

  2. the atomic type of each variable, in this case character and double,

  3. the row numbers


Inspect your data

Type the following to find out the dimensions of the tibble:

ncol(favourite_colour)  # number of variables (columns)
[1] 3
nrow(favourite_colour)  # number of observations (rows)
[1] 4
dim(favourite_colour)   # dimensions : 4 rows and 3 columns 
[1] 4 3

Head and tail

Show top and bottom rows of the tibble:

head(favourite_colour, 2)  # first 2 observations (rows)
# A tibble: 2 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
tail(favourite_colour, 3)  # last 3 observations (rows)
# A tibble: 3 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lotte  1995 Green 
2 Noa    1995 Yellow
3 Wim    1994 Purple

With the second argument to the head and tail functions you can control the number of rows.

When the second argument is omitted, as in head(favourite_colour) or tail(favourite_colour), both functions show at most 6 rows.
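
Our example tibble has only four rows, so the default of six is not visible there. A minimal sketch with a longer, hypothetical tibble called numbers shows it:

```r
library(tidyverse)

# A made-up tibble with 10 rows, just for this illustration
numbers <- tibble(n = 1:10, square = n^2)

nrow(head(numbers))   # default number of rows from the top
nrow(tail(numbers))   # default number of rows from the bottom
head(numbers, 3)$n    # n for the first 3 rows
tail(numbers, 3)$n    # n for the last 3 rows
```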

Select variables: [

Often you need to select only certain variables. This can be done using square brackets [ :

favourite_colour["colour"]
# A tibble: 4 x 1
  colour
  <chr> 
1 Blue  
2 Green 
3 Yellow
4 Purple

or a combination of variables:

favourite_colour[c("name","year")]
# A tibble: 4 x 2
  name   year
  <chr> <dbl>
1 Lucas  1995
2 Lotte  1995
3 Noa    1995
4 Wim    1994

Subsetting a tibble with [ always returns a tibble, even when only one variable is selected.
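
This differs from a plain base R data frame, as the following sketch suggests. It uses the row-and-column form [ , "colour"], which is not covered above, and the name colour_df is only an assumption for this example.

```r
library(tidyverse)

favourite_colour <- tibble(name = c("Lucas", "Lotte", "Noa", "Wim"),
                           year = c(1995, 1995, 1995, 1994),
                           colour = c("Blue", "Green", "Yellow", "Purple"))

# The same data as a plain base R data frame
colour_df <- as.data.frame(favourite_colour)

class(favourite_colour[, "colour"])  # still a tibble
class(colour_df[, "colour"])         # a character vector, no longer a table
```

With a tibble you therefore never have to check whether a selection has silently turned into a vector.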

Variables can also be selected by their indices, as we saw with vectors:

favourite_colour[2:3]
# A tibble: 4 x 2
   year colour
  <dbl> <chr> 
1  1995 Blue  
2  1995 Green 
3  1995 Yellow
4  1994 Purple
favourite_colour[c(1,3)]
# A tibble: 4 x 2
  name  colour
  <chr> <chr> 
1 Lucas Blue  
2 Lotte Green 
3 Noa   Yellow
4 Wim   Purple

To deselect use negative indices:

favourite_colour[-2]
# A tibble: 4 x 2
  name  colour
  <chr> <chr> 
1 Lucas Blue  
2 Lotte Green 
3 Noa   Yellow
4 Wim   Purple

Extract variables as vectors: [[ or $

If you want to work with a variable as an individual vector, you can extract it either with double square brackets [[ or with the $ sign:

favourite_colour[["year"]]
[1] 1995 1995 1995 1994
favourite_colour$year
[1] 1995 1995 1995 1994

In some contexts (later in the course) it is convenient to use the function pull which does the same as [[ and $ :

pull(favourite_colour, year) 
[1] 1995 1995 1995 1994
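
A small sketch to check that the three forms agree. It also shows two variations of pull that are not used above: selecting by position, and leaving the variable out, in which case pull takes the last column. The names by_name and by_position are made up for this example.

```r
library(tidyverse)

favourite_colour <- tibble(name = c("Lucas", "Lotte", "Noa", "Wim"),
                           year = c(1995, 1995, 1995, 1994),
                           colour = c("Blue", "Green", "Yellow", "Purple"))

by_name     <- pull(favourite_colour, year)  # select by variable name
by_position <- pull(favourite_colour, 2)     # select by column position

identical(by_name, favourite_colour$year)           # same vector as $
identical(by_position, favourite_colour[["year"]])  # same vector as [[

pull(favourite_colour)  # no variable given: the last column, colour
```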

Tibble to/from file

Tibbles can be written to data files and read back again. Many data formats exist but for brevity we will be using comma-separated-value (csv) format in this course. The functions involved for this purpose are write_csv and read_csv (see data import cheat sheet).

Let us now save our first tibble into a file in csv format:

write_csv(x = favourite_colour, file = "favourite_colour.csv")

The favourite_colour tibble is now written to the text file favourite_colour.csv. You can inspect the file with any text editor; it should look something like this:

name,year,colour
Lucas,1995,Blue
Lotte,1995,Green
Noa,1995,Yellow
Wim,1994,Purple

This way we can permanently store our results in files for later use. We can now read the csv file back into an R variable, e.g. favourite_colour_csv :

favourite_colour_csv <- read_csv(file = "favourite_colour.csv")

── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
cols(
  name = col_character(),
  year = col_double(),
  colour = col_character()
)

read_csv gives a summary of the variables and their inferred types.

favourite_colour_csv
# A tibble: 4 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
3 Noa    1995 Yellow
4 Wim    1994 Purple
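
If you would rather state the column types yourself than rely on the inferred ones, read_csv accepts a col_types argument. The sketch below is one possible way to do this; it writes to a temporary file so that nothing is left behind, and the names csv_path and typed are made up for the example. Reading year as an integer is only a choice made here for illustration.

```r
library(tidyverse)

favourite_colour <- tibble(name = c("Lucas", "Lotte", "Noa", "Wim"),
                           year = c(1995, 1995, 1995, 1994),
                           colour = c("Blue", "Green", "Yellow", "Purple"))

# Temporary file, used only for this sketch
csv_path <- tempfile(fileext = ".csv")
write_csv(favourite_colour, csv_path)

# Declare the type of every column instead of letting read_csv infer it
typed <- read_csv(csv_path,
                  col_types = cols(name = col_character(),
                                   year = col_integer(),
                                   colour = col_character()))

class(typed$year)                             # integer instead of double
identical(typed$name, favourite_colour$name)  # the content is unchanged
```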

Bind tibbles by rows and columns

Often we have different data sets that we would like to combine, with either i) the same set of variables or ii) the same set of observations but different variables:

bind_rows

Take for example the following two data sets with the common variables name, year and colour:

favourite_colour1  <- tibble(name=c("Lucas","Lotte","Noa","Wim"), 
                           year=c(1995,1995,1995,1994), 
                           colour=c("Blue","Green","Yellow","Purple"))
favourite_colour1
# A tibble: 4 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
3 Noa    1995 Yellow
4 Wim    1994 Purple
favourite_colour2  <- tibble(name=c("Raul", "Isaac"), 
                           year=c(1998,1998), 
                           colour=c("Red", "Green"))
favourite_colour2
# A tibble: 2 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Raul   1998 Red   
2 Isaac  1998 Green 

Then the following gives the combined tibble:

favourite_colour1_2 <- bind_rows(favourite_colour1, favourite_colour2)
favourite_colour1_2
# A tibble: 6 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
3 Noa    1995 Yellow
4 Wim    1994 Purple
5 Raul   1998 Red   
6 Isaac  1998 Green 

The function bind_rows treats tibbles as a collection of unordered variables: columns are matched by name, not by position. Let’s take the same data as in favourite_colour2 but swap the order of the variables year and colour:

favourite_colour3  <- tibble(name=c("Raul", "Isaac"), 
                           colour=c("Red", "Green"), 
                           year=c(1998,1998))
favourite_colour3
# A tibble: 2 x 3
  name  colour  year
  <chr> <chr>  <dbl>
1 Raul  Red     1998
2 Isaac Green   1998

Combining favourite_colour1 and favourite_colour3 then yields the same result as before:

favourite_colour1_3 <- bind_rows(favourite_colour1, favourite_colour3)
favourite_colour1_3
# A tibble: 6 x 3
  name   year colour
  <chr> <dbl> <chr> 
1 Lucas  1995 Blue  
2 Lotte  1995 Green 
3 Noa    1995 Yellow
4 Wim    1994 Purple
5 Raul   1998 Red   
6 Isaac  1998 Green 

What about bind_rows(favourite_colour3, favourite_colour1)?

The same observations, but the order now follows favourite_colour3, the first argument to bind_rows: the variables appear as name, colour, year, and the two rows of favourite_colour3 come first.
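
You can check this yourself with a short sketch; the name favourite_colour3_1 is made up for the example.

```r
library(tidyverse)

favourite_colour1 <- tibble(name = c("Lucas", "Lotte", "Noa", "Wim"),
                            year = c(1995, 1995, 1995, 1994),
                            colour = c("Blue", "Green", "Yellow", "Purple"))
favourite_colour3 <- tibble(name = c("Raul", "Isaac"),
                            colour = c("Red", "Green"),
                            year = c(1998, 1998))

favourite_colour3_1 <- bind_rows(favourite_colour3, favourite_colour1)

names(favourite_colour3_1)  # variable order taken from the first argument
favourite_colour3_1$name    # rows of the first argument come first
```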


Another consequence of treating tibbles as a collection of unordered variables is that the tibbles need not have identical sets of variables. In other words, the sets may be identical, as in the examples above, but they don't have to be. A variable that is missing from one of the tibbles is filled with NA for the rows of that tibble:

favourite_colour4  <- tibble(name=c("Raul", "Isaac"), 
                           colour=c("Red", "Green"), 
                           year=c(1998,1998), 
                           height= c(173, 179))
favourite_colour4
# A tibble: 2 x 4
  name  colour  year height
  <chr> <chr>  <dbl>  <dbl>
1 Raul  Red     1998    173
2 Isaac Green   1998    179
favourite_colour1_4 <- bind_rows(favourite_colour1, favourite_colour4)
favourite_colour1_4
# A tibble: 6 x 4
  name   year colour height
  <chr> <dbl> <chr>   <dbl>
1 Lucas  1995 Blue       NA
2 Lotte  1995 Green      NA
3 Noa    1995 Yellow     NA
4 Wim    1994 Purple     NA
5 Raul   1998 Red       173
6 Isaac  1998 Green     179
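
One restriction does remain: a variable that occurs in both tibbles must have compatible types, since each variable in a tibble has a single fixed type. The following sketch, with the made-up tibbles numeric_year and character_year, catches the error with tryCatch so that the script keeps running; the exact error message depends on your dplyr version.

```r
library(tidyverse)

numeric_year   <- tibble(name = "Lucas", year = 1995)    # year is a double
character_year <- tibble(name = "Raul",  year = "1998")  # year is a character

# Binding a double column to a character column is an error
result <- tryCatch(
  bind_rows(numeric_year, character_year),
  error = function(e) "incompatible types"
)
result

# Converting the column first makes the two tibbles compatible
character_year$year <- as.numeric(character_year$year)
bind_rows(numeric_year, character_year)
```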

bind_cols

Let us now combine another tibble with variable ‘height’ for the same set of observations in favourite_colour1_2:

heights <- tibble(height=c(173, 179, 167, 181 , 173, 184))
heights
# A tibble: 6 x 1
  height
   <dbl>
1    173
2    179
3    167
4    181
5    173
6    184
bind_cols(favourite_colour1_2,heights)
# A tibble: 6 x 4
  name   year colour height
  <chr> <dbl> <chr>   <dbl>
1 Lucas  1995 Blue      173
2 Lotte  1995 Green     179
3 Noa    1995 Yellow    167
4 Wim    1994 Purple    181
5 Raul   1998 Red       173
6 Isaac  1998 Green     184

bind_cols expects the same number of observations in each tibble and gives an error otherwise. Rows are paired purely by position, so you as the user are responsible for making sure the observations are in the same order in each tibble.
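
Both points can be illustrated with a small sketch. The tibbles people, short_heights and heights3 are made up for the example, and the error is caught with tryCatch so that the script keeps running.

```r
library(tidyverse)

people        <- tibble(name = c("Lucas", "Lotte", "Noa"))
short_heights <- tibble(height = c(173, 179))  # one observation short

# 3 rows against 2 rows: bind_cols refuses to combine them
result <- tryCatch(
  bind_cols(people, short_heights),
  error = function(e) "row counts differ"
)
result

# Rows are paired by position only. Sorting one tibble beforehand
# attaches the heights to different names, without any warning
heights3 <- tibble(height = c(173, 179, 167))
bind_cols(people, heights3)
bind_cols(people, arrange(heights3, desc(height)))
```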

Type of a tibble

Now inspect the type of the first tibble we created:

class(favourite_colour)
[1] "tbl_df"     "tbl"        "data.frame"

A tibble is at its core a ‘data.frame’, a base R data structure.

The classes ‘tbl_df’ and ‘tbl’ add convenient behaviours that are specific to tibbles.
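
As a minimal sketch of this relationship, the functions is_tibble, as_tibble and as.data.frame can be used to test for and convert between the two. The name plain is made up for the example.

```r
library(tidyverse)

favourite_colour <- tibble(name = c("Lucas", "Lotte", "Noa", "Wim"),
                           year = c(1995, 1995, 1995, 1994),
                           colour = c("Blue", "Green", "Yellow", "Purple"))

is_tibble(favourite_colour)      # TRUE
is.data.frame(favourite_colour)  # TRUE: a tibble is also a data.frame

# Convert to a plain data frame; the tibble classes are dropped
plain <- as.data.frame(favourite_colour)
class(plain)

# ... and back to a tibble again
is_tibble(as_tibble(plain))
```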



Copyright © 2021 Biomedical Data Sciences (BDS) | LUMC